Ecological Informatics

Elsevier BV

All preprints, ranked by how well they match Ecological Informatics's content profile, based on 29 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
A comparison of convolutional neural networks and few-shot learning in classifying long-tailed distributed tropical bird songs

Zhong, M.; LeBien, J.; Campos-Cerqueira, M.; Aide, T. M.; Miao, Z.; Dodhia, R.; Lavista Ferres, J.

2023-07-27 ecology 10.1101/2023.07.25.550590 medRxiv
Top 0.1%
43.5%

Biodiversity monitoring depends on reliable species identification, but this can often be difficult due to detectability or survey constraints, especially for rare and endangered species. Advances in bioacoustic monitoring and AI-assisted classification are improving our ability to carry out long-term studies of a large proportion of the fauna, even in challenging environments such as remote tropical rainforests. AI classifiers need training data, and this can be a challenge when working with tropical animal communities, which are characterized by high species richness but only a few common species and a long tail of rare species. Here we compare species identification results using two approaches: convolutional neural networks (CNN) and Siamese Neural Networks (SNN), a few-shot learning approach. The goal is to develop methodology that accurately identifies both common and rare species. To do this we collected more than 600 hours of audio recordings from Barro Colorado Island (BCI), Panama, and we manually annotated calls from 101 bird species to create the training data set. More than 40% of the species had fewer than 100 annotated calls and some species had fewer than 10. The results showed that Siamese Networks outperformed the more widely used convolutional neural networks (CNN), especially when the number of annotated calls is low.

2
Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species

Leipzig, J.; Bakis, Y.; Wang, X.; Elhamod, M.; Diamond, K.; Dahdul, W.; Karpatne, A.; Maga, M.; Mabee, P.; Bart, H. L.; Greenberg, J.

2021-01-29 evolutionary biology 10.1101/2021.01.28.428644 medRxiv
Top 0.1%
33.0%

Biodiversity image repositories are crucial sources of training data for machine learning approaches to biological research. Metadata, specifically metadata about object quality, is putatively an important prerequisite to selecting sample subsets for these experiments. This study demonstrates the importance of image quality metadata to a species classification experiment involving a corpus of 1935 fish specimen images which were annotated with 22 metadata quality properties. A small subset of high quality images produced an F1 accuracy of 0.41 compared to 0.35 for a taxonomically matched subset of low quality images when used by a convolutional neural network approach to species identification. Using the full corpus of images revealed that image quality differed between correctly classified and misclassified images. We found the visibility of all anatomical features was the most important quality feature for classification accuracy. We suggest biodiversity image repositories consider adopting a minimal set of image quality metadata to support future machine learning projects.

3
Zero-shot animal behavior classification with vision-language foundation models

Dussert, G.; Miele, V.; Van Reeth, C.; Delestrade, A.; Dray, S.; Chamaille-Jammes, S.

2024-07-07 ecology 10.1101/2024.04.05.588078 medRxiv
Top 0.1%
31.9%

1. Understanding the behavior of animals in their natural habitats is critical to ecology and conservation. Camera traps are a powerful tool to collect such data with minimal disturbance. However, they produce a very large quantity of images, which can make human-based annotation cumbersome or even impossible. While automated species identification with artificial intelligence has made impressive progress, automatic classification of animal behaviors in camera trap images remains a developing field. 2. Here, we explore the potential of foundation models, specifically Vision Language Models (VLMs), to perform this task without the need to first train a model, which would require some level of human-based annotation. Using an original dataset of alpine fauna with behaviors annotated by participatory science, we investigate the zero-shot capabilities of different kinds of recent VLMs to predict behaviors and estimate behavior-specific diel activity patterns in three ungulate species. 3. Our results show that using these methods, it is possible to achieve accuracies over 91% in behavior classification and produce activity patterns that closely align with those derived from participatory science data (overlap indexes between 84% and 90%). 4. These findings demonstrate the potential of foundation models and vision-language models in ecological research. Ecologists are encouraged to adopt these new methods and leverage their full capabilities to facilitate ecological studies.

4
Location Invariant Animal Recognition Using Mixed Source Datasets and Deep Learning

Shepley, A. J.; Falzon, D. G.; Meek, P.; Kwan, P.

2020-05-15 ecology 10.1101/2020.05.13.094896 medRxiv
Top 0.1%
25.9%

1. A time-consuming challenge faced by camera trap practitioners all over the world is the extraction of meaningful data from images to inform ecological management. The primary methods of image processing used by practitioners include manual analysis and citizen science. An increasingly popular alternative is automated image classification software. However, most automated solutions are not sufficiently robust to be deployed on a large scale. Key challenges include limited access to images for each species and lack of location invariance when transferring models between sites. This prevents optimal use of ecological data and results in significant expenditure of time and resources to annotate and retrain deep learning models. 2. In this study, we aimed to (a) assess the value of publicly available non-iconic FlickR images in the training of deep learning models for camera trap object detection, (b) develop an out-of-the-box location invariant automated camera trap image processing solution for ecologists using deep transfer learning and (c) explore the use of small subsets of camera trap images in optimisation of a FlickR trained deep learning model for high precision ecological object detection. 3. We collected and annotated a dataset of images of "pigs" (Sus scrofa and Phacochoerus africanus) from the consumer image sharing website FlickR. These images were used to achieve transfer learning using a RetinaNet model in the task of object detection. We compared the performance of this model to the performance of models trained on combinations of camera trap images obtained from five different projects, each characterised by 5 different geographical regions. Furthermore, we explored optimisation of the FlickR model via infusion of small subsets of camera trap images to increase robustness in difficult images. 4. In most cases, the mean Average Precision (mAP) of the FlickR trained model when tested on out of sample camera trap sites (67.21-91.92%) was significantly higher than the mAP achieved by models trained on only one geographical location (4.42-90.8%) and rivalled the mAP of models trained on mixed camera trap datasets (68.96-92.75%). The infusion of camera trap images into the FlickR training further improved AP by 5.10-22.32% to 83.60-97.02%. 5. Ecology researchers can use FlickR images in the training of automated deep learning solutions for camera trap image processing to significantly reduce time and resource expenditure by allowing the development of location invariant, highly robust out-of-the-box solutions. This would allow AI technologies to be deployed on a large scale in ecological applications.

5
Interpretable and Robust Machine Learning for Exploring and Classifying Soundscape Data

Omprakash, A.; Balakrishnan, R.; Ewers, R. M.; Sethi, S. S.

2024-11-08 ecology 10.1101/2024.11.07.622465 medRxiv
Top 0.1%
25.0%

The adoption of machine learning in Passive Acoustic Monitoring (PAM) has improved prediction accuracy for tasks like species-specific call detection and habitat quality estimation. However, these models often lack interpretability, and PAM generates vast amounts of non-informative data, as soundscapes are typically information sparse. Here, we developed ecologically interpretable methods that accurately predict land use from audio while filtering unwanted data. Audio from habitats in Southern India (evergreen forests, deciduous forests, scrublands, grasslands) was collected and categorised by land use (reference, disturbed, and agriculture). We used Gaussian Mixture Models (GMMs) on top of a Convolutional Neural Network (CNN)-based feature extractor to predict land use. Thresholding based on likelihood values from GMMs improved model accuracy by excluding uninformative data, enabling our method to outperform models such as Random Forests and Support Vector Machines. By analysing areas of acoustic feature space driving predictions, we identified "keystone" soundscape elements for each land use, including both biotic and anthropogenic sources. Our approach provides a novel method for ecologically meaningful interpretation and exploration of large acoustic datasets independent of specific feature extractors. Our study paves the way for soundscape monitoring to deliver robust and trustworthy habitat assessments on scales that would not otherwise be possible.

6
The Amazon rainforest soundscape characterized through Information Theory quantifiers

Colonna, J. G.; Carvalho, J. R.; Rosso, O. A.

2020-02-10 ecology 10.1101/2020.02.09.940916 medRxiv
Top 0.1%
23.2%

Automatic monitoring of biodiversity by acoustic sensors has become an indispensable tool to assess environmental stress at an early stage. Due to the difficulty in recognizing the Amazon's high acoustic diversity and the large amounts of raw audio data recorded by the sensors, the labeling and manual inspection of this data is not feasible. Therefore, we propose an ecoacoustic index that allows us to quantify the complexity of an audio segment and correlate this measure with the biodiversity of the soundscape. The approach uses unsupervised methods to avoid the problem of labeling each species individually. The proposed index, named the Ecoacoustic Global Complexity Index (EGCI), makes use of Entropy, Divergence and Statistical Complexity. A distinguishing feature of this index is the mapping of each audio segment, including those of varied lengths, as a single point in a 2D-plane, supporting us in understanding the ecoacoustic dynamics of the rainforest. The main results show a regularity in the ecoacoustic richness of a floodplain, considering different temporal granularities, be it between hours of the day or between consecutive days of the monitoring program. We observed that this regularity does a good job of characterizing the soundscape of the environmental protection area of Mamiraua, in the Amazon, differentiating between species richness and environmental phenomena.
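
As a rough illustration of how an entropy-based quantifier can place a signal as a single point in a 2D plane, the sketch below maps a probability distribution to a (normalized entropy, statistical complexity) pair. It is not the EGCI itself: the abstract does not say how the distribution is obtained from an audio segment, so the input here is assumed to be an already-computed distribution, and the use of the Jensen-Shannon divergence from the uniform distribution as the "divergence" term is an assumption. All function names are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two distributions of equal length."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - shannon_entropy(p) / 2 - shannon_entropy(q) / 2


def entropy_complexity_point(p):
    """Map a distribution (length >= 2) to (normalized entropy H, complexity C).

    Assumption: complexity is the normalized divergence from the uniform
    distribution multiplied by the normalized entropy.
    """
    n = len(p)
    uniform = [1.0 / n] * n
    h = shannon_entropy(p) / math.log(n)
    # The divergence from uniform is largest when all mass sits on one state,
    # so that case provides the normalization constant.
    delta = [1.0] + [0.0] * (n - 1)
    js_max = jensen_shannon(delta, uniform)
    c = (jensen_shannon(p, uniform) / js_max) * h
    return h, c
```

Under these assumptions, both a perfectly uniform distribution (pure noise) and a fully concentrated one (a pure tone) get a complexity near zero, while intermediate distributions land elsewhere in the plane.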

7
A multimodal learning approach for automated detection of wildlife trade on social media

Momeny, M.; Kulkarni, R.; Soriano-Redondo, A.; Rinne, J.; Di Minin, E.

2025-09-29 ecology 10.1101/2025.09.24.678024 medRxiv
Top 0.1%
22.5%

Social media data and machine learning methods for automated content analysis are increasingly being used in ecology and conservation science. A current limitation is the lack of methods for automated multimodal analysis of textual and visual content among other data modalities. In this study, we introduce a multimodal content analysis method applied to the investigation of wildlife trade on YouTube. Our approach consists of analyzing text through transformer-based neural networks and video keyframes using convolutional neural networks as part of multimodal filtering, followed by classification in which a decision fusion module identifies instances of wildlife trade. The decision fusion module achieved an F-score of 0.72 among textual classifiers for trade detection and of 0.77 among visual classifiers for species identification. This multimodal classification helped detect wildlife trade in 3,715 out of 86,321 filtered YouTube posts, featuring 226 species for sale, including 51 Critically Endangered, 62 Endangered, 60 Vulnerable, 25 Near Threatened, and 28 Least Concern species. The proposed multimodal learning methods can be used more broadly for other ecological and biodiversity conservation applications.
The bigger picture: The unsustainable trade in wildlife is a major driver of biodiversity loss, threatening thousands of species across the Tree of Life. While online platforms have become popular spaces for advertising wildlife and exotic pets for sale, monitoring these platforms remains extremely challenging. Traditional surveillance methods are not scalable, and automated tools have typically focused on either text or image analysis in isolation, limiting their effectiveness in identifying nuanced instances of wildlife trade. Our study introduces a multimodal machine learning framework that integrates textual and visual data to detect potential wildlife trade on YouTube. By combining natural language processing with deep learning for image analysis, and filtering millions of posts down to those most relevant, our method significantly improves detection accuracy. This dual-layered approach uncovered thousands of posts featuring hundreds of species, many of which are threatened. This work demonstrates how advances in machine learning can support ecological monitoring and conservation by providing timely, data-driven insights into online trade networks. In the pursuit of reducing biodiversity loss, this study offers an approach for bridging the gap between online behavior and real-world ecological outcomes.
Highlights: 1. Introduces a multimodal content analysis approach for detecting wildlife trade on YouTube by integrating textual and visual data. 2. A multimodal filtering technique reduces irrelevant text and video content, enhancing analytical efficiency. 3. A decision fusion module then combines results from text and video filtering, improving wildlife trade detection accuracy. 4. The proposed methods are applicable across multiple online platforms and suitable for diverse tasks in ecology and biodiversity conservation.

8
Assessing the quality of generative artificial intelligence for science communication in environmental research

Worden, D.; Richards, D.

2024-11-13 scientific communication and education 10.1101/2024.11.11.623072 medRxiv
Top 0.1%
22.4%

The adoption of Generative Artificial Intelligence (GenAI) tools is drastically changing the way that researchers work. While debate on the quality of GenAI outputs continues, there is optimism that GenAI may help human experts to address the most significant environmental challenges facing society. No previous research has quantitatively assessed the quality of GenAI outputs intended to inform environmental management decisions. Here we surveyed 98 environmental scientists and used their expertise to assess the quality of human and GenAI content relevant to their discipline. We analysed the quality and relative preference between human and GenAI content across three use cases in environmental science outreach and communication. Our results indicate that the GenAI content was generally deemed adequate in quality by human experts, with an average of 82% of respondents indicating a quality of "adequate" or better across the three use cases. Respondents exhibited strong preferences for GenAI over human-only content when using GenAI imagery of future park management scenarios. For the use cases of generating a wetland planting guide and answering a question about invasive species management, preferences were heterogeneous amongst respondents. Our findings raise substantive questions about GenAI content as a complement to human expertise when research is transferred to public audiences.

9
A supervised learning algorithm to evaluate occurrence records in virtual species

Rios, R.; Noguera-Urbano, E. A.; Espinosa, J.; Ochoa, J. M.

2021-09-06 ecology 10.1101/2021.09.06.459158 medRxiv
Top 0.1%
22.3%

Digital and open access to occurrence data has encouraged the development of tools to improve biodiversity conservation and management. In this study, we proposed a methodology to evaluate point-occurrence records based on expert knowledge. We first generated virtual data to test our methodology without confounding factors by simulating geographical distributions, virtual sampling, and expert checking of occurrence records. We used a set of non-linear bioclimatic variables and principal component analysis (PCA) to define a duality function between niche and biotope spaces. Subsequently, a supervised-learning model was fit to classify records as true or doubtful presences based on the virtual expert checking. We then tested our methodology using three virtual species and 10-fold cross validation. We also evaluated the prediction performance of the supervised model compared with the virtual observer using a virtual external database of occurrence data.
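
For readers unfamiliar with 10-fold cross validation, a minimal sketch of the index bookkeeping follows. It only shows how occurrence records could be partitioned into folds; the shuffling, the seed, and the function name `k_fold_indices` are assumptions for illustration, and the abstract does not describe the authors' actual splitting procedure.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Each record appears in exactly one test fold; the remaining folds
    form the training set for that round.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A classifier would be fit on each `train` list and scored on the matching `test` list, with the ten scores then averaged.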

10
The Freshwater Sounds Archive

Greenhalgh, J. A.; Akmentins, M.; Boullhesen, M.; Brejao, G. L.; Bowman, J. C.; Briers, R. A.; Campbell, K.; Clark, A.; Coen, M.; Desjonqueres, C.; Gaston, S.; Gottesman, B. L.; Jones, I. T.; Lahoz-Monfort, J. J.; Lindsay, E.; Rodriguez, F. M.; Navarrete-Mier, F.; Norton, M.; Las Casas e Novaes, M. C.; Okazaki, S.; Polajnar, J.; Ribeiro, M. C.; Roberts, L.; Rothenberg, D.; Sabet, S. S.; Satish, R.; Spriel, B.; Stankovic, D.; Velde, K. t.; Timperley, J. H.; Turlington, K.; Walker, J. R.; Valverde, M. P.; Cox, K.; Looby, A.

2025-05-11 ecology 10.1101/2025.05.07.652412 medRxiv
Top 0.1%
19.8%

Freshwater ecosystems are full of underwater sounds produced by amphibians, aquatic arthropods, reptiles, plants, fishes, and methane bubbles escaping from the sediment. Although much headway has been made in recent years investigating the overall soundscapes of various freshwater ecosystems around the world, there remains a significant knowledge gap in our collective inability to accurately and reliably link recorded sounds with the species that produced them. Here, we present The Freshwater Sounds Archive, a new global initiative, which seeks to address this knowledge gap by collating species-specific freshwater sound recordings into a publicly available database. By means of metadata collection, we also present a snapshot of the species studied, the recording equipment, and recording parameters used by freshwater ecoacousticians globally. In total, 61 entries were submitted to the archive between the 4th of March 2023 and the 30th of April 2025, representing 16 countries and 6 continents. The most numerous taxonomic group was arthropods (29 entries), followed by fishes (14 entries), amphibians (10 entries), macrophytes (7 entries), and a freshwater mollusk (1 entry). The majority of the submissions were from European countries (27 entries), of which the United Kingdom was the most represented with 14 entries. The next most represented region was North America (11 entries), followed by South America (8 entries), Oceania and Asia (5 entries each), Africa (3 entries), and the Middle East and Central America with 1 entry each. The global south, polar regions, and areas with an elevation >500 m (asl) were underrepresented. The field of freshwater ecoacoustics to date has largely focused on the analysis of sound types due to a current lack of knowledge of species-specific sounds. 
The Freshwater Sounds Archive presents an opportunity to move beyond the sound type approach, and towards an approach with higher taxonomic resolution, ultimately resulting in species-specific descriptions. Furthermore, The Freshwater Sounds Archive will provide freshwater ecoacousticians with one of the main tools required to start creating annotated training datasets for machine learning models from soundscape recordings by referring to known species sounds present in the archive. In the long-term, this will result in the automatic detection and classification of species-specific freshwater sounds from soundscape recordings, such as indicator, invasive, and endangered species.

11
The impacts of transfer learning, phylogenetic distance, and sample size on big-data bioacoustics

Provost, K. L.; Yang, J.; Carstens, B. C.

2022-04-18 evolutionary biology 10.1101/2022.02.24.481827 medRxiv
Top 0.1%
19.7%

Vocalizations in animals, particularly birds, are critically important behaviors that influence their reproductive fitness. While recordings of bioacoustic data have been captured and stored in collections for decades, the automated extraction of data from these recordings has only recently been facilitated by artificial intelligence methods. These have yet to be evaluated with respect to accuracy of different automation strategies and features. Here, we use a recently published machine learning framework to extract syllables from ten bird species ranging in their phylogenetic relatedness from 1 to 85 million years, to compare how phylogenetic relatedness influences accuracy. We also evaluate the utility of applying trained models to novel species. Our results indicate that model performance is best on conspecifics, with accuracy progressively decreasing as phylogenetic distance increases between taxa. However, we also find that the application of models trained on multiple distantly related species can improve the overall accuracy to levels near that of training and analyzing a model on the same species. When planning big-data bioacoustics studies, care must be taken in sample design to maximize sample size and minimize human labor without sacrificing accuracy.

12
Automated community ecology using deep learning: a case study of planktonic foraminifera

Hsiang, A. Y.; Hull, P. M.

2022-11-01 ecology 10.1101/2022.10.31.514514 medRxiv
Top 0.1%
18.9%

The development of deep learning methods using convolutional neural networks (CNNs) has revolutionised the field of computer vision in recent years. The automation of taxonomic identification using CNNs leads naturally to the use of such technology for rapidly generating large organismal datasets in order to study the evolutionary and ecological dynamics of biological communities across time and space. While CNNs have been used to train machine learning classifiers that can identify organisms to the species level for several groups, this vision of automated community ecology has yet to be thoroughly tested or fulfilled. Here, we present a case study of automated community ecology using a large dataset of Atlantic planktonic foraminifera for which the generation of species labels and morphometric measurements was completely automated. We compare standard community diversity metrics between the fully automated dataset and a "traditional" dataset with human-identified specimens. We show that there is high congruence between the results, and that machine classifications help avoid biases that can result in the inference of misleading biodiversity patterns. Our study demonstrates the viability and potential of fully automated community ecology and sets the stage for a new era of ecological and evolutionary inquiry driven by artificial intelligence.

13
The potential applications of high-resolution 3D scanners in the taxonomic classification of insects

Peacock, C. J.; Evans, W.; Goodman, S. J.; Hassall, C.

2024-06-18 scientific communication and education 10.1101/2024.06.17.599367 medRxiv
Top 0.1%
18.8%

The phenotypic classification of small biological specimens, such as insects, can be dependent on phenotypic features that are difficult to observe and communicate to others. Here, we evaluate how high-resolution 3D photogrammetric scanner technology can potentially allow such features to be resolved and visualised as 3D models, which can then be shared as a taxonomical resource for species identification, as virtual type specimens, and for educational and public engagement purposes. We test the viability and limitations of this approach using specimens digitised with an Artec Micro scanner. Ten samples from unique species were mounted and scanned. The model outputs were evaluated against an identification key, which compiled diagnostic features for the specimens from the wider literature, to describe the specimens to the lowest taxonomic level possible. The results showed that six of the ten specimens could be identified to species level using the scans. Threshold values for body length and width were 10.7 mm and 4.4 mm respectively. Below these body dimensions, important diagnostic features of specimens could not be resolved reliably. This result suggests that with current technology, 3D photogrammetric modelling is a viable method for taxonomic identification of a wide range of insect groups with larger body sizes. This approach opens up novel applications for species identification and data sharing among taxonomists, international field research, conservation efforts, and entomological outreach. However, the limitations of this approach to taxonomic identification must be considered depending upon the size of the specimen and its diagnostic features. Future developments in the technology and processing methods used may alleviate the constraints on body size exhibited in this study, widening the applications for smaller bodied specimens.

14
Automatic parameter estimation and detection of ringed seal knocking vocalizations

Solana, A.; Young, M.; Nadeu, C.; Kunnasranta, M.; Houegnigan, L.

2026-01-29 ecology 10.1101/2024.05.06.592639 medRxiv
Top 0.1%
18.7%

Passive acoustic monitoring is a valuable tool for studying elusive marine mammals, but analyzing large datasets is typically labor-intensive and costly. In this study, we piloted an automatic approach for sound analysis on extensive datasets of acoustic underwater recordings from freshwater Lake Saimaa over a total of 12 months. Our focus was on "knocking" vocalizations, the most commonly found call type of the endangered Saimaa ringed seal (Pusa saimensis). The annotated datasets of knock sounds (n = 13,179) were used to train and test binary classification systems to detect this sound type. In addition, the fundamental frequencies of the vocalizations were automatically estimated by an ensemble of methods and corroborated by recent literature. The best classifier was a spectrogram-based convolutional neural network that achieved a minimum F1-score of 97.76% on unseen samples from each dataset, demonstrating its ability to detect knockings amongst noise and other events. Moreover, the estimated fundamental frequencies are comparable to the ones manually computed for the same datasets. These automated approaches can significantly reduce labor and costs associated with manual analysis, making long-term species monitoring more feasible and efficient.

15
Bioacoustics for in situ validation of species distribution modelling: An example with bats in Brazil

Hintze, F.; Machado, R. B.; Bernard, E.

2021-03-08 ecology 10.1101/2021.03.08.434378 medRxiv
Top 0.1%
18.6%

Species distribution modelling (SDM) has gained importance in biodiversity distribution and conservation studies worldwide, including for prioritizing areas for public policies and international treaties. Its usefulness for large-scale approaches and estimates is a plus, considering that only a minor fraction of the planet is adequately sampled. However, SDM needs to be as reliable as possible. Minimizing errors is challenging but essential, considering the uses and consequences of such models. In situ validation of SDM outputs should be a key step and, in some cases, an urgent one. Bioacoustics can be used to validate and refine those outputs, especially if the focal species' vocalizations are conspicuous and species-specific. This is the case for echolocating bats. Here, we used extensive acoustic monitoring (>120 validation points, covering >758,000 km², and >300,000 sound files) to validate MaxEnt outputs for six neotropical bat species in a poorly sampled region of Brazil. Based on in situ validation, we evaluated the ability of four threshold-dependent theoretical evaluation metrics to predict model performance. We also assessed the performance of three widely used thresholds to convert continuous SDMs into presence/absence maps. We demonstrated that MaxEnt produces very different outputs, requiring a careful choice of thresholds and modeling parameters. Although all theoretical evaluation metrics studied were positively correlated with accuracy, we empirically demonstrated that metrics based on specificity-sensitivity and sensitivity-precision are better for testing models, considering that most SDMs are based on unbalanced data. Without independent field validation, we found that using an arbitrary threshold for modelling can be a precarious approach with many possible outcomes, even after getting good evaluation scores.
Bioacoustics proved to be important for validating SDMs for the six bat species analyzed, allowing a better refinement of SDMs in large and under-sampled regions, with relatively low sampling effort. Regardless of the species assessment method used, our research highlighted the vital necessity of in situ validation for SDMs.
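
To make the role of the threshold concrete, the sketch below converts continuous suitability scores into presence/absence predictions and scores them against field observations. It is an illustration only: the function name `threshold_metrics`, the `>=` cut-off rule, and the toy numbers in the usage note are assumptions, not the thresholds or metrics implementation used in the study.

```python
def threshold_metrics(scores, observed, threshold):
    """Binarize continuous suitability scores and compare with observations.

    scores    -- model suitability values, one per validation point
    observed  -- True where the species was detected in the field
    threshold -- scores at or above this value count as predicted presence
    """
    predicted = [s >= threshold for s in scores]
    observed = [bool(o) for o in observed]
    tp = sum(p and o for p, o in zip(predicted, observed))
    fp = sum(p and not o for p, o in zip(predicted, observed))
    fn = sum(not p and o for p, o in zip(predicted, observed))
    tn = sum(not p and not o for p, o in zip(predicted, observed))

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(scores)),
    }
```

With the toy inputs `scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]` and detections at the first three points, thresholds of 0.5 and 0.25 give the same accuracy but trade sensitivity against specificity, which is why a single accuracy figure can hide the effect of the threshold choice.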

16
Revisiting giraffe photo-identification using deep learning and network analysis

Miele, V.; Dussert, G.; Spataro, B.; Chamaille-Jammes, S.; Allaine, D.; Bonenfant, C.

2020-03-25 ecology 10.1101/2020.03.25.007377 medRxiv
Top 0.1%
18.5%

An increasing number of ecological monitoring programs rely on photographic capture-recapture of individuals to study distribution, demography and abundance of species. Photo-identification of individuals can sometimes be done using idiosyncratic coat or skin patterns, instead of using tags or loggers. However, when performed manually, the task of going through photographs is tedious and rapidly becomes too time consuming as the number of pictures grows. Computer vision techniques are an appealing and unavoidable help to tackle this apparently simple task in the big-data era. In this context, we propose to revisit animal re-identification using image similarity networks and metric learning with convolutional neural networks (CNNs), taking the giraffe as a working example. We first developed an end-to-end pipeline to retrieve a comprehensive set of re-identified giraffes from about 4,000 raw photographs. To do so, we combined CNN-based object detection, SIFT pattern matching, and image similarity networks. We then quantified the performance of deep metric learning to retrieve the identity of known individuals and detect unknown individuals never seen in the previous years of monitoring. After a data augmentation procedure, the re-identification performance of the CNN reached a Top-1 accuracy of about 90%, despite the very small number of images per individual in the training data set. While the complete pipeline succeeded in re-identifying known individuals, it slightly under-performed with unknown individuals. Fully based on open-source software packages, our work paves the way for further attempts to build automatic pipelines for re-identification of individual animals, not only in giraffes but also in other species.
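
The retrieval step of a metric-learning pipeline can be sketched as a nearest-neighbour lookup in embedding space, with a similarity cut-off for flagging individuals that match nothing in the catalogue. The code below is a hedged illustration, not the authors' pipeline: it assumes embeddings have already been produced by some CNN, uses cosine similarity and a fixed `min_similarity` cut-off as stand-ins, and all names (`identify`, `gallery`, `top1_accuracy`) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identify(query, gallery, min_similarity=0.8):
    """Return the best-matching known identity, or None for a suspected new individual.

    gallery maps each known identity to one reference embedding.
    """
    best_id, best_sim = None, -1.0
    for identity, embedding in gallery.items():
        sim = cosine_similarity(query, embedding)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id if best_sim >= min_similarity else None


def top1_accuracy(queries, gallery, min_similarity=0.8):
    """Fraction of (embedding, true_identity) pairs whose top match is correct."""
    hits = sum(identify(q, gallery, min_similarity) == true_id for q, true_id in queries)
    return hits / len(queries)
```

In this sketch a query that is not close enough to any catalogued embedding is returned as `None`, which is one simple way to separate known from previously unseen individuals; the cut-off value would need tuning on real data.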

17
Monitoring photogenic ecological phenomena: Social network site images reveal spatiotemporal phases of Japanese cherry blooms

ElQadi, M. M.; Dyer, A. G.; Vlasveld, C.; Dorin, A.

2021-09-13 ecology 10.1101/2021.09.13.460016 medRxiv
Top 0.1%
18.1%

Some ecological phenomena are visually engaging and widely celebrated. Consequently, these have the potential to generate large footprints in the online and social media image records which may be valuable for ecological research. Cherry tree blooms are one such event, especially in Japan where they are a cultural symbol (Sakura). For centuries, the Japanese have celebrated Hanami (flower viewing) and the historical data record of the festival allows for phenological studies over this period, one application of which is climate reconstruction. Here we analyse Flickr social network site data in an analogous way to reveal the cherry blossoms' seasonal sweep from southern to northern Japan over a twelve-week period. Our method analyses data filtered using geographical constraints, multi-stage text-tag classification, and machine vision, to assess image content for relevance to our research question and use it to estimate historic cherry bloom times. We validated our estimated bloom times against official data, demonstrating the accuracy of the approach. We also investigated an out-of-season autumn blooming that has gained worldwide media attention. Despite the complexity of human photographic and social media activity and the relatively small scale of this event, our method can reveal that this bloom has in fact been occurring over a decade. The approach we propose in our case study enables quick and effective monitoring of the photogenic spatiotemporal aspects of our rapidly changing world. It has the potential to be applied broadly to many ecological phenomena of widespread interest.

18
Modelling the niches of wild and domesticated Ungulate species using deep learning

Rademaker, M.; Hogeweg, L.; Vos, R.

2019-08-22 evolutionary biology 10.1101/744441 medRxiv
Top 0.1%
18.0%

Knowledge of global biodiversity remains limited by geographic and taxonomic sampling biases. The scarcity of species data restricts our understanding of the underlying environmental factors shaping distributions, and the ability to draw comparisons among species. Species distribution models (SDMs) were developed in the early 2000s to address this issue. Although SDMs based on single-layered Neural Networks have been experimented with in the past, these performed poorly. However, the past two decades have seen a strong increase in the use of Deep Learning (DL) approaches, such as Deep Neural Networks (DNNs). Despite the large improvement in predictive capacity DNNs provide over shallow networks, to our knowledge these have not yet been applied to SDM. The aim of this research was to provide a proof of concept of a DL-SDM. We used a pre-existing dataset of the world's ungulates and abiotic environmental predictors that had recently been used in MaxEnt SDM, to allow for a direct comparison of performance between both methods. Our DL-SDM consisted of a binary classification DNN containing 4 hidden layers and drop-out regularization between each layer. Performance of the DL-SDM was similar to MaxEnt for species with relatively large sample sizes and worse for species with relatively low sample sizes. Increasing the number of occurrences further improved DL-SDM performance for species that already had relatively high sample sizes. We then tried to further improve performance by altering the sampling procedure of negative instances and increasing the number of environmental predictors, including species interactions. This led to a large increase in model performance across the range of sample sizes in the species datasets. We conclude that DL-SDMs provide a suitable alternative to traditional SDMs such as MaxEnt and have the advantage of being both able to directly include species interactions, as well as being able to handle correlated input features. Further improvements to the model would include increasing its scalability by turning it into a multi-classification model, as well as developing a more user-friendly DL-SDM Python package.

19
RUSBoost: A suitable species distribution method for imbalanced records of presence and absence. A case study of twenty-five species of Iberian bats

Carrasco, J.; Lison, F.; Weintraub, A.

2021-10-09 ecology 10.1101/2021.10.06.463434 medRxiv
Top 0.1%
17.8%

Traditional Species Distribution Models (SDMs) may not be appropriate when examples of one class (e.g. absence or pseudo-absences) greatly outnumber examples of the other class (e.g. presences or observations), because they tend to favor the learning of observations more frequently. We present an ensemble method called Random UnderSampling and Boosting (RUSBoost), which was designed to address the case where the number of presence and absence records are imbalanced, and we opened the "black-box" of the algorithm to interpret its results and applicability in ecology. We applied our methodology to a case study of twenty-five species of bats from the Iberian Peninsula and we built a RUSBoost model for each species. Furthermore, in order to build tighter models, we optimized their hyperparameters using Bayesian Optimization. In particular, we implemented an objective function that represents the cross-validation loss: [Formula], with [Formula] representing the hyper-parameters Maximum Number of Splits, Number of Learners and Learning Rate. The models reached average values for Area Under the ROC Curve (AUC), specificity, sensitivity, and overall accuracy of 0.84 ± 0.05%, 79.5 ± 4.87%, 74.9 ± 6.05%, and 78.8 ± 5.0%, respectively. We also obtained values of variable importance and we analyzed the relationships between explanatory variables and bat presence probability. The results of our study showed that RUSBoost could be a useful tool to develop SDMs with good performance when the presence/absence databases are imbalanced. The application of this algorithm could improve the prediction of SDMs and help in conservation biology and management.
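The random undersampling idea behind RUSBoost can be sketched in a few lines. The code below covers only the resampling step that balances presences and absences. It does not implement the boosting loop, and it is not the authors' code. The function name, the 0/1 label encoding, and the toy site records are assumptions made for illustration.

```python
import random


def random_undersample(records, labels, rng):
    """Balance a binary data set by discarding majority-class examples.

    labels are assumed to be 1 (presence) or 0 (absence). All minority
    examples are kept and an equally sized random subset of the majority
    class is drawn without replacement. Original ordering is preserved.
    """
    presences = [i for i, y in enumerate(labels) if y == 1]
    absences = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((presences, absences), key=len)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [records[i] for i in kept], [labels[i] for i in kept]


# Toy data: 3 presences and 20 pseudo-absences (hypothetical site names).
records = [f"site_{i}" for i in range(23)]
labels = [1] * 3 + [0] * 20

rng = random.Random(0)  # fixed seed so the draw is repeatable
balanced_records, balanced_labels = random_undersample(records, labels, rng)
```

In the full algorithm a fresh balanced subset of this kind is drawn at each boosting iteration, so that across iterations the ensemble still sees most of the majority-class records.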

20
Is the replication crisis a problem for biologists? A geometric morphometric approach.

Vrdoljak, J.; Sanchez, K. I.; Arreola-Ramos, R.; Diaz Huesa, E. G.; Villagra, A.; Avila, L. J.; Morando, M.

2019-12-02 scientific communication and education 10.1101/862052 medRxiv
Top 0.1%
17.7%

Replicability of findings is the key factor of scientific reliability. However, literature on this topic is scarce and apparently taboo for large scientific areas. Some authors have named the failure to reproduce scientific findings the "replication crisis". Geometric morphometrics, a widely used technique, is especially silent on replication-crisis concerns. Nevertheless, some works have pointed out that sharing morphogeometric information is not trivial and needs to be done carefully and meticulously. Here, we investigated the replicability of geometric morphometrics protocols on complex shapes and the extent of measurement error in three different types of taxa, as well as the potential of these protocols to discriminate among closely related species. We found a wide range of replication error that contributed from 19.5% to 60% of the total variation. Although measurement error decreased with the complexity of the quantified shape, it often maintained high values. All protocols were able to discriminate between species, but more morphogeometric information does not imply better performance. We present evidence of a replication crisis in life sciences and highlight the need to explore in depth the different sources of variation that could lead to findings with low replicability. Lastly, we offer some recommendations to improve the replicability and reliability of scientific findings.